ABSTRACT

In a distributed heterogeneous computer system having a plurality of computer nodes each operatively connected through a network interface to a network to provide for communications and transfers of data between the nodes and wherein the nodes each have a queue for containing jobs to be performed, an improvement for dynamically reallocating the system's resources for optimized job performance. There is first logic at each node for dynamically and periodically calculating and saving a workload value as a function of the number of jobs on the node's queue. Second logic is provided at each node for transferring the node's workload value to other nodes on the network at the request of the other nodes. Finally, there is third logic at each node operable at the completion of each job. The third logic includes logic for checking the node's own workload value, logic for polling all the other nodes for their workload value if the checking node's workload value is below a preestablished value indicating the node as being underutilized and available to do more jobs, logic for checking the workload values of the other nodes as received, and logic for transferring a job from the queue of the other of the nodes having the highest workload value over a preestablished value indicating the other of the nodes as being overburdened and requiring job relief to the queue of the checking node. The third logic is also operable periodically when the node is idle.

16 Claims, 3 Drawing Sheets 
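
The three pieces of logic summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Node`, `try_pull_job`, `LOW_THRESHOLD`, and `HIGH_THRESHOLD` are hypothetical, the threshold values are arbitrary, and the network polling is replaced by direct in-process calls. The workload value is assumed to be queue length divided by service rate, and the transferred job is assumed to come from the tail of the busy node's queue; the abstract itself fixes neither choice.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List

LOW_THRESHOLD = 0.5   # hypothetical "underutilized" limit
HIGH_THRESHOLD = 2.0  # hypothetical "overburdened" limit


@dataclass
class Node:
    name: str
    service_rate: float                      # relative processing speed
    queue: deque = field(default_factory=deque)

    def workload(self) -> float:
        # First logic: workload value as a function of the queue length
        # (assumed here: queue length divided by service rate).
        return len(self.queue) / self.service_rate


def try_pull_job(checker: Node, nodes: List[Node]) -> bool:
    """Third logic: run by `checker` when one of its jobs completes
    or when its wakeup timer fires while it is idle."""
    if checker.workload() >= LOW_THRESHOLD:
        return False                         # not underutilized, no polling
    others = [n for n in nodes if n is not checker]
    if not others:
        return False
    # Second logic, simulated: every other node reports its workload value.
    busiest = max(others, key=Node.workload)
    if busiest.workload() <= HIGH_THRESHOLD:
        return False                         # nobody is overburdened
    # Assumption: the most recently queued job is the one transferred.
    checker.queue.append(busiest.queue.pop())
    return True
```

Because only an underutilized node ever polls, this sketch sends no polling traffic when every node is busy, which is the property the abstract attributes to the server-initiated approach.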
DYNAMIC RESOURCE ALLOCATION SCHEME FOR DISTRIBUTED HETEROGENEOUS COMPUTER SYSTEMS

ORIGIN OF THE INVENTION

The invention described herein was made in the performance of work under a NASA contract, and is subject to the provisions of Public Law 96-517 (35 USC 202) in which the Contractor has elected not to retain title.

TECHNICAL FIELD

The invention relates to resource allocation in computer systems and, more particularly, to a method and associated apparatus for shortening response time and improving efficiency of a heterogeneous distributed networked computer system by reallocating the jobs queued up for busy nodes to idle, or less-busy nodes. In accordance with a novel algorithm, the load-sharing is initiated by the server device in a manner such that extra overhead is not imposed on the system during heavily-loaded conditions.

BACKGROUND ART

In distributed networked computer systems there is a high probability that one of the workstations will be idle while others are overloaded. Thus, the response times for certain tasks are longer than they should be if all the capabilities in the system could be shared fully. As is known in the art, the solution is to reallocate tasks from queues connected to busy computers to idle computer queues. As depicted in FIG. 1, a distributed computer system 10 consists of several computers 12 with the same or different processing capabilities, connected together by a network 14. Each of the computers 12 has tasks 16 assigned to it for execution. In such a distributed multi-computer system, the probability is high that one of the computers 12 is idle while another computer 12 has more than one task 16 waiting in the queue for service. This probability is called the "imbalance probability". A high imbalance probability typically implies poor system performance. By reallocating queued tasks or jobs to the idle or lightly-loaded computers 12, a reduction in system response time can be expected. This technique is called "load sharing" and is one of the main foci of this invention. As also depicted in FIG. 1, such redistribution of the tasks 16 on a dynamic basis is known in the art. Typically, there is a control computer 18 attached to the network 14 containing task lists 20. On various bases, the control computer 18 dynamically reassigns tasks 16 from the lists 20 to various computers 12 within the system 10. For example, it is known in the art to have each of the computers 12 provide the control computer 18 with an indicator of the amount of computing time on tasks that is actually taking place. The control computer 18, with knowledge of the amount of use of each computer 12 available, is then able to reallocate the tasks 16 as necessary. In military computer systems, and the like, this ability to reconfigure, redistribute, and keep the system running is an important part of what is often referred to as "graceful degradation"; that is, the system 10 continues to operate as best it can to do the tasks at hand on a priority basis for as long as it can.

The inventors herein did a considerable amount of statistical analysis and evaluation of networked computer systems according to the known prior techniques for load distribution and redistribution. Their findings will now be set forth by way of example to provide a clear picture of the background and basis for the present invention.

The imbalance probability, IP, for a heterogeneous system can be calculated by mathematical techniques well known to those skilled in the art which, per se, are no part of the novelty of the present invention. There is a finite, calculable probability that i out of N computers comprising a networked system are idle. There is also a finite probability that all stations other than those i stations are busy, as well as a probability that there is exactly one job in each one of the remaining (N-i) stations, i.e. a finite probability that at least one out of (N-i) stations has one or more jobs waiting for service. By summing over the number of idle stations, from 1 to N, the imbalance probability for the whole system can be obtained. By way of example, in a homogeneous system, all the nodes (i.e. computers 12) have the same service rate and the same arrival rate. As the number of nodes increases, the peak of the imbalance probability goes higher. As the number of nodes increases to twenty, the imbalance probability approaches 1 when the traffic intensity (arrival rate divided by the service rate at each node) ranges from 40% to 80%. The statistical curves indicate that the probability of imbalance is high during moderate traffic intensity. This occurs due to the fact that [...] nodes are either idle (i.e. there is low traffic intensity) or busy (i.e. [...] traffic intensity). If [...] not evenly distributed, the imbalance probability becomes even higher. In the [...] probability of a two-node heterogeneous system the faster node is twice as fast as the slower one and the work is evenly distributed. [...] it has been observed that the imbalance probability goes even higher during high traffic intensity at the slower node. At this point, the slower node is heavily loaded even though the faster node is only 50% utilized.

Numerous studies have addressed the problem of resource-sharing in distributed systems. It is convenient to classify these strategies as being either static or dynamic in nature and as having either a centralized or decentralized decision-making capability. One can further distinguish the algorithms by the type of node that takes the initiative in the resource-sharing. Algorithms can either be sender-initiated or server-initiated. Some algorithms can be adapted to a generalized heterogeneous system while others can only be used in a homogeneous system. These categories are further explained as follows:

Static/Dynamic: Static schemes use only the information about the long-term average behavior of the system, i.e. they ignore the current state. Dynamic schemes differ from static schemes by determining how and when to transfer jobs based on the time-dependent current system state instead of the average behavior of the system. The major drawback of static algorithms is that they do not respond to fluctuations of the workload. Dynamic schemes attempt to correct this drawback but are more difficult to implement and may introduce additional overhead. In addition, dynamic schemes are hard to analyze.

Centralized/Decentralized: In a system with centralized control, jobs are assumed to arrive at the central controller which is responsible for distributing the jobs among the network's nodes; in a decentralized system,
jobs are submitted to the individual nodes and the decision to transfer a job to another node is made locally. This central dispatcher approach is quite restrictive for a distributed system.

Homogeneous/Heterogeneous system: In the homogeneous system, all the computer nodes are identical and have the same service rate. In the heterogeneous system, the computer nodes do not have the same processing power.

Sender/Server Initiated: If the source node makes a determination as to where to route a job, this is defined as a sender-initiated strategy. In server-initiated strategies, the situation is reversed, i.e., lightly-loaded nodes search for congested nodes from which work may be transferred.

The prior art as discussed in the literature (see List of Cited References hereinafter) will now be addressed with particularity.

First, there are the static strategies. Stone [Ston 78] developed a centralized maximum flow algorithm for two processors (i.e. computer nodes) by holding the load of one processor fixed and varying the load on the other processor. Ni and Hwang [Hwan 81] studied the problem of load balancing in a multiple heterogeneous processor system with many job classes. In this system, the number of processors was extended to more than two. Tantawi and Towsley [Tant 85] formulated the static resource-sharing problem as a nonlinear programming problem and presented two [...] the parametric-study algorithm [...] problem. Silva and Gerla [Silv 84] used a downhill queueing procedure to search for the static optimal job assignment in a heterogeneous system that supports multiple job classes and site constraints. Recently, Kurose and Singh [Kuro 86] used an iterative algorithm to deal with the static decentralized load-sharing problem. Their algorithm was examined by theoretical and simulation techniques.

Next, there are the dynamic strategies. Chow and Kohler [Chow 79] used a queueing theory approach to examine a resource-sharing algorithm for a heterogeneous two-processor system with a central dispatcher. Their objective was to minimize the mean response time. Foschini and Salz [Fosc 79] generalized one of the methods developed by Chow and Kohler to include multiple job dispatchers. Wah [Wah 84] studied the communication overhead of a centralized resource-sharing scheme designed for a homogeneous system. Load-balancing of the Purdue ECN (Engineering Computer Network) was implemented with a dynamic decentralized RXE (remote execution environment) program [Hwan 82]. With the decentralized RXE, the load information of all the processors was [...] network machine's kernel. One of the problems with this approach is the potentially high cost of obtaining the required state information. It is also possible for an idle processor to acquire jobs from several processors and thus become overloaded. Ni and Xu [Ni 85] propose the "draft" algorithm for a homogeneous system. Wah and Juang [Wah 85] propose a window control algorithm to schedule the resource in local computer systems with a multi-access network. Wang and Morris [Wang 85] studied ten different algorithms for homogeneous systems to evaluate the performance differences. Eager, et al. [Eage 86] addressed the problem of decentralized load sharing in a multiple system using dynamic-state information. Eager discussed the appropriate level of complexity for load-sharing policies and showed that schemes that use relatively simple state information do very well and perform quite closely to the optimal expected performance. The system configuration studied by Eager, et al. was also a homogeneous system. Towsley and Lee [Tows 86] used the threshold of the local job queue length at each host to make decisions for remote processing. This computer system was generalized to be a heterogeneous system.

In summary, most of the work reported in the literature has been limited to either static schemes, centralized control, homogeneous systems, or to two-processor systems where overhead considerations were ignored. All of these approaches make assumptions that are too restricted to apply to most real computer system installations. The main contribution of this reported work is the development of a dynamic, decentralized, resource-sharing algorithm for a heterogeneous multiple (i.e. greater than two) processor system. Because it is server-initiated, this approach thus differs significantly from the sender-initiated approach described in [Tows 86]. The disadvantage of this prior art sender-initiated approach is that it imposes extra overhead in the heavily-loaded situation and therefore, it could bring the system to an unstable state.

LIST OF CITED REFERENCES

[Baum 85] Baumgartner, K. M. and Wah, B. W. "The Effects of Load Balancing on Response Time for Local Computer Systems with a Multiaccess Network," International Comm. Conf. 1985, pp. [...].

[Chow 79] Chow, Y. C. and Kohler, W. H. "Models for Dynamic Load Balancing in a Heterogeneous Multiple Processor System," IEEE Trans. Computers, Vol. C-28, No. 5, pp. 354-361, May 1979.

[Eage 86] Eager, D. L., Lazowska, E. D., and Zahorjan, J. "Adaptive Load Sharing in Homogeneous Distributed Systems," IEEE Trans. on Software Eng., Vol. SE-12, No. 5, May 1986.

[Eage 85] Eager, D. L., Lazowska, E. D., and Zahorjan, J. "A Comparison of Receiver-Initiative and Sender-Initiative Dynamic Load Sharing," Tech Report No. 85-04-01, Dept. of Computer Science, University of Washington, April 1985.

[Fosc 79] Foschini, G. J. and Salz, J. "A Basic Dynamic Routing Problem with Diffusion," IEEE Trans. Commun., Vol. Com-26, pp. 320-327, March 1978.

[Huan 82] Hwang, K. and Wah, B. "A UNIX-Based Local Computer Network with Load Balancing," IEEE Computer, April 1982.

[Hwan 81] Hwang, K. and Ni, L. M. "Optimal Load Balancing Strategies for a Multiple Process System," Proc. of Intl. Conf. on Parallel Processing, August 1981.

[Hwan 82] Hwang, K. and Croft, W. J., et al. "A UNIX-Based Local Computer Network with Load Balancing," IEEE Computer magazine, April 1982.

[Kari 85] Karian, Z. A. "GPSS/PC: Discrete-Event Simulation on the IBM PC," Byte, October 1985.

[Klei 75] Kleinrock, L. "Queueing Systems, Vol. I: Theory," John Wiley & Sons, 1975.

[Kuro 86] Kurose, J. and Singh, S. "A Distributed Algorithm for Optimal Static Load Balancing in Distributed Computer Systems," IEEE Infocom, April 1986.

[Ni 81] Ni, L. M. and Hwang, K. "Optimal Load Balancing Strategies for a Multiple Processor System,"
Proc. Tenth Intl. Conf. Parallel Processing, pp. 352-357, August 1981.

[Ni 85] Ni, L. M. "A Distributed Drafting Algorithm for Load Balancing," IEEE Trans. on Software Eng., Vol. SE-11, No. 10, October 1985.

[Silv 84] Silva, E. S. and Gerla, M. "Load Balancing in Distributed Systems with Multiple Classes and Site Constraints," Performance 84 (North Holland), pp. 17-33, 1984.

[Ston 78] Stone, H. S. "Critical Load Factors in Two-Processor Distributed Systems," IEEE Trans. on Software Engineering, Vol. SE-4, No. 3, pp. 254-258, May 1978.

[Tows 86] Towsley, D. and Lee, K. J. "A Comparison of Priority-Based Decentralized Load Balancing Policies," ACM Performance Evaluation Review, Vol. 14, No. 1, pp. 70-77, May 1986.

[Tows] Towsley, D. and Lee, K. J. "On the Analysis of a Decentralized Load Sharing Policy in Heterogeneous [...]"

[Tant 85] Tantawi, A. N. and Towsley, D. "Optimal [...]"

[Triv 82] Trivedi, K. S. "Probability and Statistics with Reliability, Queueing and Computer Science Applications," Prentice-Hall Inc., 1982.

[Wah 85] Wah, B. and Lien, Y. N. "Design of Distributed Database on Local Computer Systems with a Multi-Access Network," IEEE Trans. Software Engineering, Vol. SE-11, No. 7, July 1985.

[Wah 85] Wah, B. and Juang, J. Y. "Resource Scheduling for Local Computer Systems with a Multi-Access Network," IEEE Trans. on Computers, Vol. C-34, No. 12, December 1985.

[Wang 85] Wang, Y. T. and Morris, R. J. T. "Load Sharing in Distributed Systems," IEEE Trans. on Computers, Vol. C-34, pp. 204-217, March 1985.

The foregoing articles and reports from the literature are only generally relevant for background discussion purposes and, since copies are not readily available to applicants for filing herewith, they are not being provided. In addition to the foregoing non-supplied articles from the literature, however, copies of the following relevant U.S. Letters Patent are being provided herewith:

[1] Hoschler, H., Raimar, W., and Bandmaler, K. "Method of Operating a Data Processing System," U.S. Pat. No. 4,099,235, July 4, 1978.

[2] Kitajima, H. and Ohmachi, K. "Processing Request Allocator for Assignment of Loads in a Distributed Processing System," U.S. Pat. No. 4,495,570, Jan. 22, 1985.

[3] Fry, S. M., Hempy, H. O., and Kittinger, B. E. "Balancing Data-Processing Workloads," U.S. Pat. No. 4,403,286, Sept. 6, 1983.

With respect to the above-listed U.S. Patents and the teaching thereof vis-a-vis the present invention to be described hereinafter, the inventors herein have invented a new dynamic load-balancing scheme for a distributed computer system consisting of a number of heterogeneous hosts connected by a local area network (LAN). As mentioned above, numerous studies have addressed the problem of resource-sharing in distributed systems. For purposes of discussion and distinguishing, it is convenient to classify these strategies as being either static or dynamic in nature and as having either a centralized or decentralized decision-making capability. One can further distinguish the algorithms employed by the type of node that takes the initiative in the resource-sharing. Algorithms can be either sender-initiated or receiver-initiated. Some algorithms can be adapted to a generalized heterogeneous system while others can be used only in a homogeneous system. These categories are further addressed with respect to the above-referenced prior art patents as follows.

Centralized/Decentralized: In a system with centralized control (as shown in FIG. 1) jobs arrive at the central control computer 18 which is responsible for distributing the jobs among the network nodes. In a decentralized system, jobs are submitted to the individual nodes and the decision to transfer a job to another node is made locally. The central dispatcher approach is quite restrictive for a distributed system. In the teaching of their patent, Kitajima, H. and Ohmachi assign a processing request allocator to be the single controller of the centralized scheme. One of the problems with this approach is the potentially high cost of obtaining the required state information. It is also possible for an idle processor to acquire jobs from several processors and thus become overloaded.

Homogeneous/Heterogeneous system: In a homogeneous system, the computer nodes must be identical and have the same service rate. In a heterogeneous system, the computer nodes do not have the same processing power. In their patent, Fry, Hempy, H. O., and Kittinger disclose a scheme to balance data-processing workloads on a homogeneous environment.

Sender/Receiver Initiated: If the source node makes a determination as to where to route a job, this is defined as a sender-initiated strategy. In receiver-initiated strategies, the situation is reversed, i.e., lightly-loaded nodes search for congested nodes from which work may be transferred. In their patent, Hoschler, H., Raimar, W. and Brandmaler disclose a sender-initiated scheme. The inventors herein have proved that the receiver-initiated approach is superior at medium to high loads and, therefore, have incorporated such an approach in their invention in a novel manner.

Static/Dynamic: Static schemes use only the information about the long-term average behavior of the system, i.e. they ignore the current state. Dynamic schemes differ from the static schemes by determining how and when to transfer jobs based on the time-dependent current system state instead of the average behavior. The major drawback of static algorithms is that they do not respond to fluctuations of the workload. Dynamic schemes attempt to correct this drawback.

STATEMENT OF THE INVENTION

Accordingly, an object of the invention is the providing of a dynamic decentralized resource-sharing algorithm for a heterogeneous multi-processor system.

It is another object of the invention to provide a dynamic decentralized resource-sharing algorithm for a heterogeneous multi-processor system which is receiver-initiated in heavy load so that it does not impose extra overhead in the heavily-loaded situation and, therefore, will not bring the system to an unstable state.

Another object of the present invention is to prevent an idle node in a heterogeneous multi-processor system from becoming isolated from the resource-sharing process, as can happen with the systems of Fry and Kitajima, by providing a wakeup timer used at each idle
node to periodically cause the idle node to search for a job that can be transferred from a heavily-loaded node.

Still another object of the present invention is to use the local queue length and the local service rate ratio at each node as a more efficient workload indicator.

It is yet a further object of the invention to provide a dynamic decentralized resource-sharing algorithm for a heterogeneous multi-processor system which dynamically adjusts to the traffic load and does not generate extra overhead during high traffic loading conditions and, therefore, cannot bring the system to an unstable state.

The foregoing objects have been achieved in a distributed heterogeneous computer system having a plurality of computer nodes each operatively connected through a network interface to a network to provide for communications and transfers of data between the nodes and wherein the nodes each have a queue for containing jobs to be performed, by the improvement of the present invention for dynamically reallocating the system's resources for optimized job performance. There is first logic at each node for dynamically and periodically calculating and saving a workload value as a function of the number of jobs on the node's queue. Second logic is provided at each node for transferring the node's workload value to other nodes on the network at the request of the other nodes. Finally, there is third logic at each node operable at the completion of each job. The third logic includes logic for checking the node's own workload value, logic for polling all the other nodes for their workload value if the checking node's workload value is below a pre-established value indicating the node as being underutilized and available to do more jobs, logic for checking the workload values of the other nodes as received, and logic for transferring a job from the queue of the other of the nodes having the highest workload value over a pre-established value indicating the other of the nodes as being overburdened and requiring job relief to the queue of the checking node. The third logic is also operable periodically when the node is idle.

Other objects and advantages of the present invention will become apparent from the description which follows hereinafter when taken in conjunction with the drawing figures which accompany it.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a simplified block diagram of a prior art distributed computer system employing a control computer to distribute and redistribute the tasks among the computer nodes on the network.

FIG. 2 is a simplified block diagram of a distributed computer system according to the present invention.

FIG. 3 is a functional block diagram of a computer from the system of FIG. 2 pointing out the portions thereof related to the present invention.

FIG. 4 is a functional block diagram of the system of FIG. 2 showing the manner in which tasks are dynamically reallocated according to the dual mode approach of the present invention.

FIG. 5 is a flowchart of the logic of the method of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The new and novel resource-sharing algorithm of the present invention, which the inventors call the Server-Initiated Dynamic Resource-Sharing Algorithm (SIDA for short), will now be presented. A queueing analysis of the algorithm's performance is presented and the result is validated by the reporting of simulation results. The basic environment is as depicted in FIG. 2; that is, there are a plurality of computer nodes 12' distributed across a network 14 without the need for a control computer 18 as in the prior art of FIG. 1.

As observed earlier herein, most of the work of the prior art has been limited to either static schemes, centralized control, or homogeneous systems. All of these approaches make assumptions that are much too restrictive to apply to most real computer system installations. The main contribution of the present invention is the providing of a dynamic, decentralized, resource-sharing algorithm for a heterogeneous, multi-processor, computer system (such as that generally indicated as 10' in FIG. 2). The algorithm employed in the present invention uses a dual-mode, server-initiated approach which is clearly novel over anything taught or suggested by the prior art. Jobs are transferred from heavily burdened nodes (i.e. over a high threshold limit) to low burdened (or idle) nodes at the initiation of the receiving node when (1) a job finishes at a node which is burdened below a pre-established threshold level or (2) a node has been idle for a period of time as established by a wakeup timer at the node. The important advantage of this approach is that, unlike the prior art approaches, it does not impose extra overhead in the heavily-loaded situation. Therefore, it cannot bring the system to an unstable state. The algorithm also has two important advantages over the prior art. First, to prevent an idle node from becoming isolated from the resource-sharing process, the wakeup timer is included. As will be addressed in greater detail shortly, the wakeup timer is used at each idle node to periodically cause the idle node to search for a job that can be transferred from a heavily-loaded node. Second, this invention uses a combination of the local queue length and the local service rate ratio at each node as the workload indicator. In a heterogeneous computer system, it is more efficient to use this workload indicator rather than just the local queue length as employed in the prior art.

It was determined by the inventors herein that an ideal resource-sharing algorithm should have the following characteristics:

1) Dynamic: the load distribution should adapt to rapid system load changes.

2) Decentralized: each processing computer node should determine, on its own, whether to process a job locally or to send the job to some other node for processing. There is no need for a central dispatcher. Since the central dispatcher is not required, the problem of a potential single point of failure is eliminated.

3) Server-initiated: a good scheme should only request job relocation when there are idle processors available to serve the relocated jobs. By using the server-initiated approach, the danger of sender-initiated schemes, i.e. that all the nodes are overloaded and each attempts to give away jobs causing unproductive overhead which simply further saturates the already overloaded system, is eliminated.

The following is an outline of the basic SIDA algorithm [...]:

N = Number of total nodes in the heterogeneous system;
H_i = Service rate of node i;
Q_i = Queue length at node i.

WHILE
{[(a job completes at node i) or wakeup-timeout] and [Q_i/H_i ≤ LOW-THRESHOLD]}
DO
  probe Q_k for k = 1 to N, k ≠ i;
  identify the node, j, with the MAX(Q_k);
  if Q_j/H_j ≥ HIGH-THRESHOLD then
  DO
    transfer 1 job from node j to node i;
    (* job is processed at node i and the results are returned to node j *)
  END
END.

The environment of the present invention is shown in greater detail in FIGS. 3 and 4 while the dual mode algorithm logic is shown in flowchart form in FIG. 5. Each computer node 12' includes a task queue 22. Pointers are provided to the top of queue (TOQ), end of queue (EOQ), and bottom of queue (BOQ). The length of the active contents of the queue 22 at any time can be determined by subtracting TOQ from BOQ. The percentage of the active contents of the queue 22 at any time is, of course, easily calculated as (BOQ - TOQ)/(EOQ - TOQ). As depicted in FIG. 3, the operating system 24, for example, can provide a service rate 26 for the node 12', i.e. the ratio (percentage) of computational usage of the node 12' compared with its potential. The SIDA, as incorporated into the block labelled TASK ALLOCATION AND TRANSFER LOGIC 28, uses the length of the local queue 22 and the local service rate 26 at each node 12' as the workload indicator 30 for that node 12'. When a job finishes at a node 12', the logic 28 of the node 12' checks the workload indicator 30 obtained by dividing the queue length by the service rate. If the workload indicator 30 is less than a certain low threshold level, the lightly-loaded node 12' initiates a search for the most busy node 12'.

As depicted in FIG. 4, in the system 10' of the present invention, the task allocation and transfer logic 28 of each node 12' is connected to the network 14 through a network interface 32. Thus, as configured, the task allocation and transfer logic 28 of each node 12' can access the last calculated workload indicator 30 of all the other nodes 12' on the network 14. To accomplish a search for the most busy node 12', the task allocation and transfer logic 28 of a node 12' simply requests the workload indicators 30 of all the other nodes 12' on the network 14. If the workload indicator 30 of the most busy node 12' is above a certain high threshold, a job from the task queue 22 of that busy node 12' is transferred to the lightly-loaded node 12'. To prevent an idle node 12' from becoming isolated from the resource-sharing process, the wakeup timer 34 is used at each idle node 12' to periodically cause the idle node 12' to search for a job which can be transferred from a heavily loaded node 12'. Thus, SIDA provides a method which bal-

this prior art assumption and teaching, however, for SIDA it is necessary and preferred to assign the highest priority level to the resource-sharing process. This is because, otherwise, the lightly-loaded nodes 12' could not receive the workload indicators 30 from the other nodes 12' upon which to base a decision and jobs would never be transferred from the heavily-loaded nodes 12' in a timely manner, i.e. before the heavily-loaded nodes 12' become lightly-loaded.

VALIDITY TESTING RESULTS

The present invention and its novel algorithm were verified by simulation modeling. A multiple processor heterogeneous system was considered in which the service rates of the nodes are not necessarily identical. Each node was modeled as a queueing center. For a particular station m, new jobs were assumed to arrive at rate r_m. The average service rate was s_m and the inter-wakeup time, which was the same for all nodes, was 1/r_w. The state of the system was defined to be the number of jobs in a node, either in the queue or being served. The objective of the performance analysis was to determine the effects of resource-sharing according to the method of the present invention on the average system response time. These effects are a function of the traffic intensity, which is defined as the ratio of the job arrival rate to the job service rate.

The basic assumptions made in this performance study were as follows:

a) There is only one class of tasks. The task arrival rate to each processor is exponentially distributed. The arrival rates at each processor may be different.

b) The resource service rates are exponentially distributed and may be different for each resource processor.

c) The average inter-wakeup rate r_w for each idle node is the same and is exponentially distributed.

d) Each job needs only one resource.

e) The network transmission delay in propagating requests, probing the status of other nodes, and returning results is negligible. This assumption is valid if the transmission bandwidth is large compared to the traffic on the network.

f) All processing overhead for probing, packing and unpacking data, request and result transfer are ignored. This assumption is valid if the processing required to pack and unpack the job is significantly less than the processing required to process the job.

g) For simplicity, only the queue length was used as the workload indicator.

h) Lightly-loaded nodes polled jobs from any heavily-loaded node rather than selecting only the most heavily-loaded one.

i) The low threshold is assumed to be 0. In other words, a node tries to poll a job from other nodes while

ances the workload among the nodes 12', resulting in a it is idle. 

beneficial and substantial improvement in the response j) The high threshold is assumed to be 2. 

time and throughput performance of the total system. It Exact analysis of the algorithm was quite complex 

should be noted that SIDA adjusts dynamically to the since one is faced with an N-dimensional Markov chain, 

traffic load qf each node 12'. When the workload indi- 60 The inventors employed a simplified approximate 

cator of every node 12' is greater than a certain thresh- model which they proved to their, satisfaction to be 

old level, the algorithm generates no overhead— an quite accurate in performance prediction. The approach 

important and novel advantage over the prior art. characterized the iteration between the various queues 

In the prior art, such as reported in Wah's research in terms of their steady-state occupancy probabilities. 

[Wan 85], the priority level of the load-balancing pro- 65 Iteration was then used to update estimates of the inter- 

cess is assumed to be lower than that of regular jobs, action. To account for the load from other nodes, the 

thereby preventing the resource-sharing procedure inventors divided by the expected number of nodes that 

from inhibiting normal operation. In direct contrast to have more than one job. The findings were as follows. 
When the last job is completed and the node is about to
leave state 1, node m always probes all other nodes. There
are three detailed procedures that take place during this
state transition:

1) The last local job finishes.

2) Node m probes all the other nodes to check whether any
other node has more than one job. Note, there is a
probability that at least one of the remote nodes has one or
more jobs waiting in queue for service.

3) If a remote node has more than one job, the algorithm
transfers one job from the remote node to the local node and
the node remains in state 1. There is a probability that
node m cannot find an overloaded node and therefore node m
transfers to state 0 with the service rate s_m.

While in an idle state, node m "wakes up" frequently at the
average wakeup rate r_w. In all cases, after wakeup, this
idle node should be able to poll a job from a remote node
with the exception of the case where all other nodes are
either idle or only have one job.
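
The two transitions just described (leaving state 1 when the
last job finishes, and waking up from the idle state) can be
sketched as follows. This is only an illustrative sketch of
the simplified model, not the inventors' implementation. The
function names and the list-of-job-counts representation are
hypothetical, and taking the first qualifying remote node is
an assumption consistent with polling from any heavily-loaded
node.

```python
from typing import List, Optional


def probe_and_pull(jobs: List[int], m: int) -> Optional[int]:
    """Node m probes every other node; jobs[i] is the number of jobs at node i.

    If some remote node holds more than one job, move one job to node m and
    return that node's index. Otherwise return None and leave jobs unchanged.
    Assumption: the first qualifying node found is used as the donor.
    """
    for i, count in enumerate(jobs):
        if i != m and count > 1:
            jobs[i] -= 1
            jobs[m] += 1
            return i
    return None


def finish_last_job(jobs: List[int], m: int) -> Optional[int]:
    """Node m completes its only job, then probes before going idle.

    Node m ends in state 1 if a donor was found, otherwise in state 0.
    """
    if jobs[m] != 1:
        raise ValueError("node m must hold exactly one job")
    jobs[m] = 0                      # step 1: the last local job finishes
    return probe_and_pull(jobs, m)   # steps 2 and 3: probe, then transfer


def wake_up(jobs: List[int], m: int) -> Optional[int]:
    """An idle node m wakes up and tries to poll a job from a remote node."""
    if jobs[m] != 0:
        raise ValueError("only an idle node wakes up")
    return probe_and_pull(jobs, m)
```

For example, with job counts [1, 1, 3], finishing the last
job at node 0 pulls one job from node 2 and node 0 stays in
state 1. With [1, 1, 0] no remote node has more than one
job, so node 0 drops to state 0.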

The inventors saw that the transition rates, and hence the
steady state solution, for node m depended on the steady
state probabilities of the other queues in the system.
Iteration was used to produce a hopefully close
approximation. From the state-transition diagram, the
inventors calculated the steady-state probabilities, and
found that the results verified the expected performance of
the algorithm embodied in the present invention.
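
The task allocation and transfer logic of the detailed
description (workload indicator equal to queue length
divided by service rate, a low threshold that triggers
polling, and a high threshold that qualifies the busiest
node as a donor) can be sketched as below. This is a minimal
sketch under stated assumptions: the class and function
names are hypothetical, the threshold values are left to the
caller, and taking the job from the tail of the donor's
queue is an assumption, since the text does not say which
job is moved.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


def queue_occupancy(toq: int, boq: int, eoq: int) -> float:
    """Fraction of the queue in use, (BOQ - TOQ) / (EOQ - TOQ)."""
    return (boq - toq) / (eoq - toq)


@dataclass
class Node:
    name: str
    service_rate: float                      # must be greater than zero
    queue: Deque[str] = field(default_factory=deque)

    def workload(self) -> float:
        """Workload indicator: queue length divided by service rate."""
        return len(self.queue) / self.service_rate


def rebalance(local: Node, others: List[Node],
              low: float, high: float) -> Optional[str]:
    """Run when a job finishes at `local` or when its wakeup timer fires.

    Returns the transferred job, or None if nothing was moved.
    """
    if local.workload() >= low or not others:
        return None                          # not lightly loaded: no polling
    busiest = max(others, key=Node.workload)  # poll all nodes, keep the busiest
    if busiest.workload() <= high or not busiest.queue:
        return None                          # nobody is above the high threshold
    job = busiest.queue.pop()                # assumption: take the tail job
    local.queue.append(job)
    return job
```

Because the first check returns before any polling happens,
a node whose own indicator is at or above the low threshold
generates no network traffic, which matches the no-overhead
property claimed for the busy case.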

We claim:

1. In a distributed heterogeneous computer system having a
plurality of computer nodes each operatively connected
through a network interface to a network to provide for
communications and transfers of data between the nodes and
wherein the nodes each have a queue for containing jobs to
be performed, the improvement for dynamically reallocating
the system's resources for optimized job performance
comprising:

a) means at each node for dynamically and periodically
calculating and saving a workload value as a function of the
number of jobs on the node's queue;

b) means at each node for transferring the node's said
workload value to other nodes on the network at the request
of said other nodes; and,

c) means at each node operable at the completion of each
job,

c1) for checking the node's own said workload value,

c2) for polling all the other nodes for their said workload
value if the checking node's said workload value is below a
pre-established value indicating the node as being
underutilized and available to do more jobs,

c3) for checking the said workload values of the other nodes
as received, and

c4) for transferring a job from the queue of the other of
the nodes having the highest said workload value over a
pre-established value indicating said other of the nodes as
being overburdened and requiring job relief to the queue of
the checking node.

2. The improvement to a distributed heterogeneous computer
system of claim 1 and additionally comprising means at each
node periodically operable when the node is idle:

a) for checking the node's own said workload value;

b) for polling all the other nodes for their said workload
value if the checking node's said workload value is below a
pre-established value indicating the node as being
underutilized and available to do more jobs;

c) for checking the said workload values of the other nodes
as received; and,

d) for transferring a job from the queue of the other of the
nodes having the highest said workload value over a
pre-established value indicating said other of the nodes as
being overburdened and requiring job relief to the queue of
the checking node.

3. The improvement to a distributed heterogeneous 
computer system of claim 1 wherein: 

said means at each node for dynamically and periodi- 
cally calculating and saving a workload value as a 
function of the number of jobs on the node's queue 
comprises means for dividing the number of jobs 
on the node's queue by the service rate of the node. 

4. The improvement to a distributed heterogeneous 
computer system of claim 1 wherein: 

a) the jobs to be performed by the nodes are assigned 
priority levels; and, 

b) said polling of all the other nodes for their said 
workload value by a node is accomplished by the 
node as a job at the highest priority level. 

5. A distributed heterogeneous computer system hav- 
ing dynamic resource allocation comprising: 

a) network means for providing a communications 
path for computer; 

b) a plurality of computer nodes each operatively 
connected through a network interface to said 
network means whereby to communicate and 
transfer data between said nodes, said nodes each 
having a queue for containing jobs to be per- 
formed; 

c) means at each said node for dynamically and peri- 
odically calculating and saving a workload value as 
a function of the number of jobs on said node's 
queue; 

d) means at each node for transferring said node's said
workload value to other nodes over said network at the
request of said other nodes; and,

e) means at each node operable at the completion of each
job,

e1) for checking said node's own said workload value,

e2) for polling all the other said nodes for their said
workload value if the checking node's said workload value is
below a pre-established value indicating said node as being
underutilized and available to do more jobs,

e3) for checking the said workload values of the other said
nodes as received, and

e4) for transferring a job from said queue of the other of
said nodes having the highest said workload value over a
pre-established value indicating said other of said nodes as
being overburdened and requiring job relief to said queue of
the checking node.

6. The distributed heterogeneous computer system of 
claim 5 and additionally comprising means at each said 
node periodically operable when said node is idle: 

a) for checking said node's own said workload value; 

b) for polling all the other said nodes for their said 
workload value if the checking node's said work- 
load value is below a pre-established value indicat- 
ing said node as being underutilized and available 
to do more jobs; 
c) for checking the said workload values of the other said
nodes as received; and,

d) for transferring a job from said queue of the other of
said nodes having the highest said workload value over a
pre-established value indicating said other of said nodes as
being overburdened and requiring job relief to said queue of
the checking node.

7. The distributed heterogeneous computer system of 
claim 5 wherein: 

said means at each node for dynamically and periodi- 
cally calculating and saving a workload value as a 
function of the number of jobs on said node's queue 
comprises means for dividing the number of jobs 
on said node's queue by the service rate of said 
node. 

8. The improvement to a distributed heterogeneous 
computer system of claim 5 wherein: 

a) the jobs to be performed by said nodes are assigned 
priority levels; and, 

b) said polling of all the other nodes for their said 
workload value by a node is accomplished by said 
node as a job at the highest priority level. 

9. In a distributed heterogeneous computer system 
having a plurality of computer nodes each operatively 
connected through a network interface to a network to 
provide for communications and transfers of data be- 
tween the nodes and wherein the nodes each have a 
queue for containing jobs to be performed, the method 
of operation for dynamically reallocating the system's 
resources for optimized job performance comprising 
the steps of: 

a) at each node, dynamically and periodically calculating
and saving a workload value as a function of the number of
jobs on the node's queue;

b) transferring the node's workload value to other nodes on
the network at the request of the other nodes; and,

c) at each node at the completion of each job,

c1) checking the node's own workload value,

c2) polling all the other nodes for their workload value if
the checking node's workload value is below a
pre-established value indicating the node as being
underutilized and available to do more jobs,

c3) checking the workload values of the other nodes as
received, and

c4) transferring a job from the queue of the other of the
nodes having the highest workload value over a
pre-established value indicating the other of the nodes as
being overburdened and requiring job relief to the queue of
the checking node.

10. The method of operating a distributed heteroge- 
neous computer system of claim 9 and when the node is 
idle additionally comprising the steps of: 

a) checking the node's own workload value;

b) polling all the other nodes for their workload value if
the checking node's workload value is below a
pre-established value indicating the node as being
underutilized and available to do more jobs;

c) checking the workload values of the other nodes as
received; and,

d) transferring a job from the queue of the other of the
nodes having the highest workload value over a
pre-established value indicating the other of the nodes as
being overburdened and requiring job relief to the queue of
the checking node.

11. The method of operating a distributed heterogeneous
computer system of claim 9 wherein said step of dynamically
and periodically calculating and saving a workload value as
a function of the number of jobs on the node's queue
comprises the step of:

dividing the number of jobs on the node's queue by the
service rate of the node.

12. The method of operating a distributed heterogeneous
computer system of claim 9 wherein the jobs to be performed
by the nodes are assigned priority levels and:

said step of polling of all the other nodes for their
workload value by a node is accomplished by the node as a
job at the highest priority level.

13. In a distributed heterogeneous computer system having a
plurality of computer nodes each operatively connected
through a network interface to a network to provide for
communications and transfers of data between the nodes and
wherein the nodes each have a queue for containing jobs to
be performed, the improvement for dynamically reallocating
the system's resources for optimized job performance
comprising:

a) first logic means at each node for dynamically and
periodically calculating and saving a workload value as a
function of the number of jobs on the node's queue;

b) second logic means at each node for transferring the
node's said workload value to other nodes on the network at
the request of said other nodes; and,

c) third logic means at each node operable at the completion
of each job, said third logic means including,

c1) means for checking the node's own said workload value,

c2) means for polling all the other nodes for their said
workload value if the checking node's said workload value is
below a pre-established value indicating the node as being
underutilized and available to do more jobs,

c3) means for checking the said workload values of the other
nodes as received, and

c4) means for transferring a job from the queue of the other
of the nodes having the highest said workload value over a
pre-established value indicating said other of the nodes as
being overburdened and requiring job relief to the queue of
the checking node.

14. The improvement to a distributed heterogeneous computer
system of claim 13 wherein:

said third logic means is also operable periodically when
the node is idle.

15. The improvement to a distributed heterogeneous computer
system of claim 13 wherein:

said first logic means at each node comprises means for
dividing the number of jobs on the node's queue by the
service rate of the node.

16. The improvement to a distributed heterogeneous computer
system of claim 13 wherein the jobs to be performed by the
nodes are assigned priority levels and wherein additionally:

said means for polling of all the other nodes for their said
workload value by a node of said third logic means includes
means for accomplishing said polling as a job at the highest
priority level.
***** 